
Conversation

kakra commented Dec 15, 2025

Export patch series: https://github.com/kakra/linux/pull/44.patch

Gaming and Desktop Interactivity Patches

This is a combined patchset from various sources, including ports of some old, now-deprecated legacy Proton features. The patchset focuses on memory management and interactivity, optimizing the system for responsiveness rather than maximum throughput, to reduce stutters, memory thrashing and frame drops. Most unrelated ZEN patches have been dropped.

Changes since 6.12 LTS

  • ntsync: dropped because it's upstream now (you may need to add a udev rule `ACTION=="add|change", KERNEL=="ntsync", SUBSYSTEM=="misc", MODE="0666"` to set proper permissions)
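The rule can be dropped into a standalone udev rules file; a minimal sketch (the file name is only a suggestion, pick any name that sorts after your distribution's defaults):

```shell
# /etc/udev/rules.d/99-ntsync.rules (hypothetical file name)
# Grant all users read/write access to the ntsync misc device,
# as required by Proton/Wine's NTSYNC support.
ACTION=="add|change", KERNEL=="ntsync", SUBSYSTEM=="misc", MODE="0666"
```

Reload with `udevadm control --reload` and re-plug or `udevadm trigger` for the rule to take effect without a reboot.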

Included Patches

  • Selected ZEN interactive patches: memory management, interactivity, gaming defaults, disk scheduler defaults
  • Threaded IRQs by default: cherry-picked from CK for better responsiveness
  • Prefer full idle SMT cores: prefers idle CPU cores for better throughput and better response times
  • BORE scheduler from CachyOS: ignores ZEN interactive if enabled (https://github.com/firelzrd/bore-scheduler)

Recommendation

Kernel 6.18 LTS seems to have worse performance characteristics for games than kernel 6.12: frame pacing in particular can jitter a lot more, reducing perceived smoothness even though average FPS is identical or better. I found that I can fix this by building the kernel with support for sched_ext and then running the LAVD scheduler, which also yields lower power consumption when idle (sys-app/kernel in Gentoo):

CONFIG_BPF=y
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_BPF_SYSCALL=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_FUNCTION_TRACER=y
CONFIG_KPROBE_EVENTS=y
CONFIG_SCHED_CLASS_EXT=y

Here's a talk on the technical details of the LAVD scheduler, which was initially made for the Steam Deck: https://www.youtube.com/watch?v=5F-vQgv4sI0
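With those config options enabled, switching to LAVD at runtime is a matter of installing the scx schedulers and starting scx_lavd; a rough sketch, assuming the scx_lavd binary from the sched-ext/scx project is packaged for your distribution (service file name and /etc/default/scx layout vary by distro):

```shell
# Verify the running kernel was built with sched_ext support
zgrep CONFIG_SCHED_CLASS_EXT /proc/config.gz

# Start the LAVD scheduler manually (runs until stopped; needs root)
sudo scx_lavd

# Or, where the scx service files are packaged, select it persistently,
# e.g. via SCX_SCHEDULER=scx_lavd in /etc/default/scx:
sudo systemctl enable --now scx.service
```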


Deprecated

  • Soft dirty flag with reset: most likely only used by old Proton versions, will be dropped if issues show up, or with the next LTS
  • FUTEX wait multiple opcode 31: most likely only used by old Proton versions, will be dropped with the next LTS

dsd and others added 24 commits December 14, 2025 23:15
Contains:
  - PCI: Add Intel remapped NVMe device support

    Consumer products that are configured by default to run the Intel SATA AHCI
    controller in "RAID" or "Intel RST Premium With Intel Optane System
    Acceleration" mode are becoming increasingly prevalent.

    Under this mode, NVMe devices are remapped into the SATA device and become
    hidden from the PCI bus, which means that Linux users cannot access their
    storage devices unless they go into the firmware setup menu to revert back
    to AHCI mode - assuming such an option is available. Lack of support for this
    mode is also causing complications for vendors who distribute Linux.

    Add support for the remapped NVMe mode by creating a virtual PCI bus,
    where the AHCI and NVMe devices are presented separately, allowing the
    ahci and nvme drivers to bind in the normal way.

    Unfortunately the NVMe device configuration space is inaccessible under
    this scheme, so we provide a fake one, and hope that no DeviceID-based
    quirks are needed. The interrupt is shared between the AHCI and NVMe
    devices.

    Allow pci_real_dma_dev() to traverse back to the real DMA device from
    the PCI devices created on our virtual bus, in case the iommu driver
    will be involved with data transfers here.

    The existing ahci driver is modified to not claim devices where remapped
    NVMe devices are present, allowing this new driver to step in.

    The details of the remapping scheme came from patches previously
    posted by Dan Williams and the resulting discussion.

    https://phabricator.endlessm.com/T24358
    https://phabricator.endlessm.com/T29119

    Signed-off-by: Daniel Drake <drake@endlessm.com>

  - PCI: Fix order of remapped NVMe devices

Signed-off-by: Kai Krakow <kai@kaishome.de>
There's plenty of room on the stack for a few more inlined bytes here
and there. The measured stack usage at runtime is still safe without
this, and performance is surely improved at a microscopic level, so
remove it.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
From ClearLinux's own patches, disable both AVX2 and tree vectorization
when using -O3 with architectures higher than generic amd64.

Source: https://github.com/clearlinux-pkgs/linux/blob/main/0133-novector.patch
Signed-off-by: Kai Krakow <kai@kaishome.de>
ATA init is the long pole in the boot process, and it's asynchronous.
Move the graphics init after it so that ATA and graphics initialize
in parallel.

Signed-off-by: Kai Krakow <kai@kaishome.de>
Significant time was spent in synchronize_rcu in evdev_detach_client
when applications closed evdev devices. Switching the VT away from a
graphical environment commonly leads to mass input device closures,
which could lead to noticeable delays on systems with many input devices.

Replace synchronize_rcu with call_rcu, deferring reclaim of the evdev
client struct until after the RCU grace period instead of blocking the
calling application.

While this does not solve all slow evdev fd closures, it takes care of a
good portion of them, including this simple test:

	#include <fcntl.h>
	#include <unistd.h>

	int main(int argc, char *argv[])
	{
		int idx, fd;
		const char *path = "/dev/input/event0";
		for (idx = 0; idx < 1000; idx++) {
			if ((fd = open(path, O_RDWR)) == -1) {
				return -1;
			}
			close(fd);
		}
		return 0;
	}

Time to completion of above test when run locally:

	Before: 0m27.111s
	After:  0m0.018s

Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Per [Fedora][1], they intend to change the default max map count for
their distribution to improve OOTB compatibility with games played
through Steam/Proton.  The value they picked comes from the Steam Deck,
which defaults to INT_MAX - MAPCOUNT_ELF_CORE_MARGIN.

Since most ZEN and Liquorix users probably play games, follow Valve's
lead and raise this value to their default.

[1]: https://fedoraproject.org/wiki/Changes/IncreaseVmMaxMapCount
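For reference, the resulting value follows from the kernel's constants (INT_MAX is 2^31 - 1, and MAPCOUNT_ELF_CORE_MARGIN is 5 in include/linux/mm.h); on a stock kernel the same value can be set via sysctl (the drop-in file name below is just a suggestion):

```shell
# INT_MAX - MAPCOUNT_ELF_CORE_MARGIN = 2147483647 - 5 = 2147483642
# e.g. /etc/sysctl.d/99-max-map-count.conf (hypothetical file name):
vm.max_map_count = 2147483642
```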

Signed-off-by: Kai Krakow <kai@kaishome.de>
Contains:
  - mm: Stop kswapd early when nothing's waiting for it to free pages

    Keeping kswapd running when all the failed allocations that invoked it
    are satisfied incurs a high overhead due to unnecessary page eviction
    and writeback, as well as spurious VM pressure events to various
    registered shrinkers. When kswapd doesn't need to work to make an
    allocation succeed anymore, stop it prematurely to save resources.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

  - mm: Don't stop kswapd on a per-node basis when there are no waiters

    The page allocator wakes all kswapds in an allocation context's allowed
    nodemask in the slow path, so it doesn't make sense to have the kswapd-
    waiter count per each NUMA node. Instead, it should be a global counter
    to stop all kswapds when there are no failed allocation requests.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

  - mm: Increment kswapd_waiters for throttled direct reclaimers

    Throttled direct reclaimers will wake up kswapd and wait for kswapd to
    satisfy their page allocation request, even when the failed allocation
    lacks the __GFP_KSWAPD_RECLAIM flag in its gfp mask. As a result, kswapd
    may think that there are no waiters and thus exit prematurely, causing
    throttled direct reclaimers lacking __GFP_KSWAPD_RECLAIM to stall on
    waiting for kswapd to wake them up. Incrementing the kswapd_waiters
    counter when such direct reclaimers become throttled fixes the problem.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

Signed-off-by: Kai Krakow <kai@kaishome.de>
This patch disables the staggered spinup used for HDDs.

The goal is to make boot times faster, with the small downside of a
small spike in power consumption.

Systems with many HDDs would see considerably faster boots.

This makes sense in the zen kernel, as it's supposed to be a kernel
specialized for desktop performance, and faster boot times fit into
that description.

Signed-off-by: Pedro Montes Alcalde <pedro.montes.alcalde@gmail.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
In case of a multi-queue device, the code pointlessly loaded the
default elevator just to drop it again.

Signed-off-by: Kai Krakow <kai@kaishome.de>
Fall back straight to none instead of mq-deadline. Some benchmarks in a
[recent paper][1] suggest that mq-deadline has too much lock contention,
hurting throughput and eating CPU waiting for spinlocks.

[1]: https://research.spec.org/icpe_proceedings/2024/proceedings/p154.pdf

Signed-off-by: Kai Krakow <kai@kaishome.de>
Use [defer+madvise] as default khugepaged defrag strategy:

For some reason, the default strategy to respond to THP fault fallbacks
is still just madvise, meaning stall if the program wants transparent
hugepages, but don't trigger a background reclaim / compaction if THP
begins to fail allocations.  This creates a snowball effect where we
still use the THP code paths, but we almost always fail once a system
has been active and busy for a while.

The option "defer" was created for interactive systems where THP can
still improve performance.  If we have to fallback to a regular page due
to an allocation failure or anything else, we will trigger a background
reclaim and compaction so future THP attempts succeed and previous
attempts eventually have their smaller pages combined without stalling
running applications.

We still want madvise to stall applications that explicitly want THP,
so defer+madvise _does_ make a ton of sense.  Make it the default for
interactive systems, especially if the kernel maintainer left
transparent hugepages on "always".

Reasoning and details in the original patch: https://lwn.net/Articles/711248/
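On a running kernel, the same strategy can be selected at runtime through sysfs, without this patch and without a reboot (not persistent across reboots):

```shell
# Show the current khugepaged defrag strategy; the active value is in brackets
cat /sys/kernel/mm/transparent_hugepage/defrag

# Select defer+madvise at runtime (needs root)
echo defer+madvise > /sys/kernel/mm/transparent_hugepage/defrag
```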

Signed-off-by: Kai Krakow <kai@kaishome.de>
5.7:
Take "sysctl_sched_nr_migrate" tune from early XanMod builds of 128. As
of 5.7, XanMod uses 256 but that may affect applications that require
timely response to IRQs.

5.15:
Per [a comment][1] on our ZEN INTERACTIVE commit, reducing the cost of
migration makes the system less responsive under high load.  Most
likely the combination of reduced migration cost and the higher number
of tasks that can be migrated at once contributes to this.

To better handle this situation, restore the mainline migration cost
value and also reduce the max number of tasks that can be migrated in
batch from 128 to 64.

If this doesn't help, we'll restore the reduced migration cost and cap
the total number of tasks that can be migrated at once at 32.

[1]: zen-kernel@be5ba23#commitcomment-63159674

6.6:
Port the tuning to EEVDF, which removed a couple of settings.

6.7:
Instead of increasing the number of tasks that migrate at once, migrate
the amount acceptable for PREEMPT_RT, but reduce the cost so migrations
occur more often.

This should make CFS/EEVDF behave more like out-of-tree schedulers that
aggressively use idle cores to reduce latency, but without the jank
caused by rebalancing too many tasks at once.

Signed-off-by: Kai Krakow <kai@kaishome.de>
4.10:
During some personal testing with the Dolphin emulator, MuQSS has
serious problems scaling its frequencies causing poor performance where
boosting the CPU frequencies would have fixed them.  Reducing the
up_threshold to 45 with MuQSS appears to fix the issue, letting the
introduction to "Star Wars: Rogue Leader" run at 100% speed versus about
80% on my test system.

Also, let's refactor the definitions and add some indentation to help
the reader discern what the scope of all the macros is.

5.4:
On the last custom kernel benchmark from Phoronix with Xanmod, Michael
configured all the kernels to run using ondemand instead of the kernel's
[default selection][1].  This reminded me that another option outside of
the kernels control is the user's choice to change the cpufreq governor,
for better or for worse.

In Liquorix, performance is the default governor whether you're running
acpi-cpufreq or intel-pstate.  I expect laptop users to install TLP or
LMT to control the power balance on their system, especially when
they're plugged in or on battery.  However, it's pretty clear to me
that a lot of people would choose ondemand over performance, since it's
not obvious that it has huge performance ramifications with MuQSS, and
ondemand is otherwise "good enough" for most people.

Let's codify lower up thresholds for MuQSS to more closely synergize with
its aggressive thread migration behavior.  This way when ondemand is
configured, you get sort of a "performance-lite" type of result but with
the power savings you expect when leaving the running system idle.

[1]: https://www.phoronix.com/scan.php?page=article&item=xanmod-2020-kernel

5.14:
Although CFS and similar schedulers (BMQ, PDS, and CacULE), reuse a lot
more of mainline scheduling and do a good job of pinning single threaded
tasks to their respective core, there are still applications that
confusingly run steady near 50% and benefit from going full speed or
turbo when they need to run (emulators for more recent consoles come to
mind).

Drop the up threshold for all non-MuQSS schedulers from 80/95 to 55/60.

5.15:
Remove MuQSS cpufreq configuration.

Signed-off-by: Kai Krakow <kai@kaishome.de>
This option is already disabled when CONFIG_PREEMPT_RT is enabled;
let's turn it off when CONFIG_ZEN_INTERACTIVE is set as well.

Signed-off-by: Kai Krakow <kai@kaishome.de>
What watermark boosting does is preemptively fire up kswapd to free
memory when there hasn't been an allocation failure. It does this by
increasing kswapd's high watermark goal and then firing up kswapd. The
reason why this causes freezes is because, with the increased high
watermark goal, kswapd will steal memory from processes that need it in
order to make forward progress. These processes will, in turn, try to
allocate memory again, which will cause kswapd to steal necessary pages
from those processes again, in a positive feedback loop known as page
thrashing. When page thrashing occurs, your system is essentially
livelocked until the necessary forward progress can be made to stop
processes from trying to continuously allocate memory and trigger
kswapd to steal it back.

This problem already occurs with kswapd *without* watermark boosting,
but it's usually only encountered on machines with a small amount of
memory and/or a slow CPU. Watermark boosting just makes the existing
problem worse enough to notice on higher spec'd machines.

Disable watermark boosting by default since it's a total dumpster fire.
I can't imagine why anyone would want to explicitly enable it, but the
option is there in case someone does.
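On kernels without this patch, the same behavior can be approximated at runtime through the corresponding sysctl (the mainline default is 15000, i.e. a 150% boost):

```shell
# Disable watermark boosting entirely (0 turns the mechanism off; needs root)
sysctl -w vm.watermark_boost_factor=0
```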

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Per an [issue][1] on the chromium project, swap-in readahead causes more
jank than not.  This might be caused by poor optimization in the
swapping code, or by the fact that under memory pressure we're pulling
in pages we don't need, causing more swapping.

Either way, this change is upstream in Chromium, and ChromeOS
developers care a lot about system responsiveness. Let's implement the
same change so Zen Kernel users benefit.

[1]: https://bugs.chromium.org/p/chromium/issues/detail?id=263561
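On stock kernels the equivalent knob is vm.page-cluster: it is the log2 of the number of pages read ahead on a swap fault, so 0 means a single page and effectively disables swap-in readahead:

```shell
# Default is 3 (2^3 = 8 pages per swap-in); 0 disables readahead (needs root)
sysctl -w vm.page-cluster=0
```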

Signed-off-by: Kai Krakow <kai@kaishome.de>
Partially overrides ZEN interactive adjustments if enabled.

Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
Link: https://aur.archlinux.org/packages/linux-cachyos-bore
When selecting an idle CPU for a task, always try to prioritize
full-idle SMT cores (CPUs belonging to an SMT core where all its
siblings are idle) over partially-idle cores.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
Add an option to wait on multiple futexes using the old interface that
uses opcode 31 through the futex() syscall. Do that by simply
translating the old interface to use the new code. This allows old and
stable versions of Proton to still use fsync in new kernel releases.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
v2: ported from 6.1 to 6.6
v3: ported from 6.12 to 6.18

Signed-off-by: Kai Krakow <kai@kaishome.de>
v2: ported from 6.1 to 6.6

Signed-off-by: Kai Krakow <kai@kaishome.de>
@kakra kakra mentioned this pull request Dec 15, 2025

kakra commented Dec 25, 2025

Added note about performance characteristics to the top post:

Kernel 6.18 LTS seems to have worse performance characteristics for games than kernel 6.12: frame pacing in particular can jitter a lot more, reducing perceived smoothness even though average FPS is identical or better. I found that I can fix this by building the kernel with support for sched_ext and then running the LAVD scheduler, which also yields lower power consumption when idle (sys-app/kernel in Gentoo)
